{
    "cells": [
     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
       "[View the runnable example on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/nano/tutorial/notebook/inference/tensorflow/tensorflow_inference_bf16.ipynb)"
      ]
     },
     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
       "# Use BFloat16 Mixed Precision for TensorFlow Keras Inference"
      ]
     },
     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
       "Brain Floating Point Format (BFloat16) is a custom 16-bit floating point format designed for machine learning. BFloat16 is comprised of 1 sign bit, 8 exponent bits, and 7 mantissa bits. With the same number of exponent bits, BFloat16 has the same dynamic range as FP32, but requires only half the memory usage.\n",
       "\n",
       "BFloat16 Mixed Precison combines BFloat16 and FP32 during training and inference, which could lead to increased performance and reduced memory usage. Compared to FP16 mixed precision, BFloat16 mixed precision has better numerical stability.\n",
       "\n",
       "When conducting BF16 mixed precision inference on CPU, it could be a common case that the model is pretrained in FP32. With the help of `InferenceOptimizer.quantize(..., precision='bf16')` API, you could conduct BF16 mixed precsion inference on a FP32 pretrained model with a few lines of code."
      ]
     },
     {
      "cell_type": "markdown",
      "metadata": {
       "nbsphinx": "hidden"
      },
      "source": [
       "First, you need to install BigDL-Nano for Tensorflow inference:"
      ]
     },
     {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
       "nbsphinx": "hidden"
      },
      "outputs": [],
      "source": [
       "!pip install --pre --upgrade bigdl-nano[tensorflow,inference] # install the nightly-built version\n",
       "!source bigdl-nano-init # set environment variables"
      ]
     },
     {
      "cell_type": "markdown",
      "metadata": {
       "nbsphinx": "hidden"
      },
      "source": [
       "> 📝 Note\n",
       ">\n",
       "> We recommend to run the commands above, especially `source bigdl-nano-init` before jupyter kernel is started, or some of the optimizations may not take effect."
      ]
     },
     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
       "> ⚠️ Warning\n",
       ">\n",
       "> BigDL-Nano will enable intel’s oneDNN optimizations by default. oneDNN BFloat16 are only supported on platforms with AVX512 instruction set.\n",
       ">\n",
       "> Platforms without hardware acceleration for BFloat16 could lead to bad BFloat16 inference performance."
      ]
     },
     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
       "Let's take a MobileNetV2 Keras model pretained on ImageNet dataset as an example. It is clear that the model here is in FP32."
      ]
     },
     {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
       "from tensorflow import keras\n",
       "\n",
       "fp32_model = keras.applications.MobileNetV2(weights=\"imagenet\")"
      ]
     },
     {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {},
      "outputs": [
       {
        "name": "stdout",
        "output_type": "stream",
        "text": [
         "The model's dtype policy is float32\n"
        ]
       }
      ],
      "source": [
       "print(f\"The model's dtype policy is {fp32_model.dtype_policy.name}\")"
      ]
     },
     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
       "## Without Extra Accelertor"
      ]
     },
     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
       "To conduct BF16 mixed preision inference, one approach is to convert the layers (except for the input one) in the Keras model to have `mixed_bfloat16` as their dtype policy. To achieve this, you could simply **import BigDL-Nano** `InferenceOptimizer`**, and quantize your model without extra accelerator**:"
      ]
     },
     {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
       "from bigdl.nano.tf.keras import InferenceOptimizer\n",
       "\n",
       "bf16_model = InferenceOptimizer.quantize(fp32_model, precision='bf16')"
      ]
     },
     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
       "> 📝 Note\n",
       ">\n",
       "> Currently, Nano does not support BF16 quantization without extra accelerator on a custom Keras model (e.g. inherited from `tf.keras.Model`).\n",
       ">\n",
       "> If you want to quantize a custom Keras model in BF16, you could try to set `accelerator='openvino'` together with `precision='bf16'` instead. See the next section for more information."
      ]
     },
     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
       "## With Extra Accelerator"
      ]
     },
     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
       "You could also conduct BF16 mixed precision inference with OpenVINO at the mean time as the accelerator. To achieve this, you could simply **import BigDL-Nano** `InferenceOptimizer`**, and quantize your model with** `accelerator='openvino'`:"
      ]
     },
     {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
       "from bigdl.nano.tf.keras import InferenceOptimizer\n",
       "\n",
       "bf16_ov_model = InferenceOptimizer.quantize(fp32_model, \n",
       "                                            precision='bf16',\n",
       "                                            accelerator='openvino')"
      ]
     },
     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
       "> 📝 Note\n",
       ">\n",
       "> Please note that, when you have a custom model to quantize (e.g. inherited from `tf.keras.Model`), you need to specify the `input_spec` parameter to let OpenVINO accelerator know the shape of the model input.\n",
       ">\n",
       "> Please refer to [API documentation](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/Nano/tensorflow.html#bigdl.nano.tf.keras.InferenceOptimizer.quantize) for more information on `InferenceOptimizer.quantize`."
      ]
     },
     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
       "After quantizing your model with or without extra accelerator, you could then conduct BF16 mixed precision inference as normal:"
      ]
     },
     {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {},
      "outputs": [
       {
        "name": "stdout",
        "output_type": "stream",
        "text": [
         "The time for 100 iterations of FP32 inference is: 20.940536975860596 s\n",
         "The time for 100 iterations of BF16 mixed precision inference is: 16.39808487892151 s\n",
         "The time for 100 iterations of BF16 mixed precision inference with OpenVINO is: 5.174031972885132 s\n"
        ]
       }
      ],
      "source": [
       "import numpy as np\n",
       "import time\n",
       "\n",
       "test_data = np.random.rand(32, 224, 224, 3)\n",
       "\n",
       "# FP32 inference\n",
       "st1 = time.time()\n",
       "for _ in range(100):\n",
       "    fp32_model(test_data)\n",
       "print(f'The time for 100 iterations of FP32 inference is: {time.time() - st1} s')\n",
       "\n",
       "# BF16 mixed precision inference\n",
       "st2 = time.time()\n",
       "for _ in range(100):\n",
       "    bf16_model(test_data)\n",
       "print(f'The time for 100 iterations of BF16 mixed precision inference is: {time.time() - st2} s')\n",
       "\n",
       "# BF16 mixed precision inference with OpenVINO\n",
       "st3 = time.time()\n",
       "for _ in range(100):\n",
       "    bf16_ov_model(test_data)\n",
       "print(f'The time for 100 iterations of BF16 mixed precision inference with OpenVINO is: {time.time() - st3} s')"
      ]
     },
     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
       "> 📚 **Related Readings**\n",
       "> \n",
       "> - [How to install BigDL-Nano](https://bigdl.readthedocs.io/en/latest/doc/Nano/Overview/install.html)\n",
       "> - [How to install BigDL-Nano in Google Colab](https://bigdl.readthedocs.io/en/latest/doc/Nano/Howto/Install/install_in_colab.html)"
      ]
     }
    ],
    "metadata": {
     "kernelspec": {
      "display_name": "Python 3 (ipykernel)",
      "language": "python",
      "name": "python3"
     },
     "language_info": {
      "codemirror_mode": {
       "name": "ipython",
       "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.16"
     },
     "vscode": {
      "interpreter": {
       "hash": "f6c4fac624a9bd3b1c7bcafb358e36fcd9daccaa962ba059d07bbc89607fe634"
      }
     }
    },
    "nbformat": 4,
    "nbformat_minor": 2
   }
   